Can Models Learned from a Dataset Reflect Acquisition of Procedural Knowledge? An Experiment with Automatic Measurement of Online Review Quality

Authors

  • Martina Megasari
  • Pandu Wicaksono
  • Chiao Yun Li
  • Clément Chaussade
  • Shibo Cheng
  • Nicolas Labroche
  • Patrick Marcel
  • Verónika Peralta
Abstract

Can models learned from a dataset reflect how good humans are at mastering a particular skill? This paper studies this question in the context of online review writing, where the skill corresponds to the procedural knowledge needed to write helpful reviews. To this end, we model the quality of a review as a combination of various metrics stemming from text analysis (such as readability, polarity, spelling errors or length), and we use customer-declared helpfulness as the ground truth for constructing the model. We use Knowledge Tracing, a popular model of skill acquisition, to measure how the ability to write good-quality reviews evolves over a period of time. While recent studies have tried to measure the quality of a review and correlate it with helpfulness, to the best of our knowledge, our work is the first to address this question as the exercise of a reviewer's skill over a sequence of reviews. Our experiments on a set of 41,681 Amazon book reviews show that it is possible to accurately assess an individual's acquisition of the skill of writing a helpful review, based on a statistical model of the procedural knowledge at hand rather than on human evaluations, which are prone to subjectivity and to variation over time.
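As a rough illustration of the Knowledge Tracing step mentioned in the abstract, the sketch below implements the standard Bayesian Knowledge Tracing update over a sequence of binary outcomes. The abstract does not say which variant the authors use, how review quality is binarized, or what parameter values they fit, so the names `BKTParams` and `trace_skill`, the good/not-good encoding of each review, and the numbers in the usage example are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BKTParams:
    """Standard Bayesian Knowledge Tracing parameters (illustrative names)."""
    p_init: float   # probability the skill is already known before the first review
    p_learn: float  # probability of acquiring the skill between two reviews
    p_slip: float   # probability of a poor review despite knowing the skill
    p_guess: float  # probability of a good review without knowing the skill


def trace_skill(observations: Iterable[bool], params: BKTParams) -> List[float]:
    """Return the estimated probability that the skill is known after each review.

    Each observation is True if the review was judged to be of good quality
    (an assumed binarization; the abstract does not specify one).
    """
    p_known = params.p_init
    trajectory = []
    for good in observations:
        if good:
            # Bayes update given a good review.
            num = p_known * (1 - params.p_slip)
            den = num + (1 - p_known) * params.p_guess
        else:
            # Bayes update given a poor review.
            num = p_known * params.p_slip
            den = num + (1 - p_known) * (1 - params.p_guess)
        posterior = num / den
        # Account for the chance of learning before the next review.
        p_known = posterior + (1 - posterior) * params.p_learn
        trajectory.append(p_known)
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the update.
    params = BKTParams(p_init=0.2, p_learn=0.1, p_slip=0.1, p_guess=0.2)
    print(trace_skill([True, False, True, True], params))
```

Under these assumptions, a run of good reviews pushes the estimate of skill mastery upward, while a poor review pulls it back down before the learning term is applied.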

Similar resources

Clustering of nasopharyngeal carcinoma intensity modulated radiation therapy plans based on k-means algorithm and geometrical features

Background: The design of intensity modulated radiation therapy (IMRT) plans is difficult and time-consuming. Retrieving similar IMRT plans from an IMRT plan dataset can effectively improve the quality and efficiency of IMRT plans and automate the design of IMRT planning. However, large IMRT plan datasets make retrieval inefficient. Materials and Methods: An intensity-m...

Automatic Identification and Classification of the Iranian Traditional Music Scales (Dastgāh) and Melody Models (Gusheh): Analytical and Comparative Review on Conducted Research

Background and Aim: Automatic identification and classification of the Iranian traditional music scales (Dastgāh) and melody models (Gusheh) has attracted the attention of researchers for more than a decade. The current research aims to review the research conducted in this area and consider its different approaches and obstacles. Method: The research approach is content analysis and data col...

Automatic segmentation of glioma tumors from BraTS 2018 challenge dataset using a 2D U-Net network

Background: Glioma is the most common primary brain tumor, and early detection of tumors is important in treatment planning for the patient. The precise segmentation of the tumor and intratumoral areas on the MRI by a radiologist is the first step in the diagnosis; in addition to being time-consuming, it can also lead to different diagnoses from different physicians. The aim of this study...

P5: A Review of Memory Cognitive Function in Patients with Posttraumatic Stress Disorder (PTSD)

Memory impairment is one of the main features of post-traumatic stress disorder (PTSD). There are multiple studies on memory impairment and cognitive function covering a variety of memory types: explicit, implicit, procedural, active, declarative, revermid, working, visual, false and autobiographical. The methodology was a systematic review of meta-analyses and controlled studies from the sites Med ...

Real-time quality monitoring in debutanizer column with regression tree and ANFIS

A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs and the outp...

Publication date: 2018